On Dyck languages
In this exercise we consider the Dyck language, that is the language of well-balanced parentheses (and variations of it). More precisely, given the alphabet \Sigma=\{(,)\}, the Dyck language \mathsf{dyck}(1) is \mathsf{dyck}(1) =\{w\in \Sigma^* \mid \text{for every prefix $u$ of $w$} \ \ |u|_(\geq |u|_)\land |w|_(=|w|_)\}.
Similarly, let \mathsf{dyck}(s) be the Dyck language on s pairs of parentheses, i.e. the language of correctly nested sequences of s distinct types of parentheses. For instance, given the two types of parentheses (, ), and [, ], the word ([])\in \mathsf{dyck}(2), ()[]\in \mathsf{dyck}(2) but ([)]\notin \mathsf{dyck}(2).
Show that the grammar S \to SS \mid (S) \mid \lambda generates exactly \mathsf{dyck}(1). Is it ambiguous?
Show that the grammar S \to (S)S \mid \lambda generates exactly \mathsf{dyck}(1). Is it ambiguous?
Show that the grammar S \to SS \mid (S) \mid [S] \mid\lambda generates exactly \mathsf{dyck}(2). Is it ambiguous?
Show that the grammar S \to (S)S \mid [S]S \mid \lambda generates exactly \mathsf{dyck}(2). Is it ambiguous?
Construct an unambiguous grammar that generates \mathsf{dyck}(s) for an arbitrary s.
Let \sigma=\{(,),[,]\} and L be the language of all words over \Sigma^* such that, ignoring the symbols [, ], the words are well-parenthesised on (, ), and, similarly, ignoring the symbols (, ), the words are well-parenthesised on [, ]. In particular \mathsf{dyck}(2)\subseteq L, but the containment is strict. For instance, ([)]\in L\setminus \mathsf{dyck}(2). Show that L is not a context-free language.
TipUse the fact that the language \{a^nb^nc^n \mid n\in \mathbb N\} is not context-free.
The Dyck languages are important because, in a sense, they are the most complicated context-free languages. Indeed, the Chomsky–Schützenberger representation theorem states that a language L is context-free if an only if L is the image under an homomorphism of some \mathsf{dyck}(s) intersected with a regular language.